AITopics | chronicling america

Model Details

Neural Information Processing Systems, Feb-18-2026, 04:22:50 GMT

We decreased the confidence threshold to 0.1 to increase article and headline [...] The following specifications were used: { resolution: 256, learning rate: 2e-3 }. This limit is binding for common words, e.g., "the". The recognizer is trained using the Supervised Contrastive ("SupCon") loss function [7], a gener- [...] In particular, we work with the "outside" SupCon loss formulation [...] We use a MobileNetV3 (Small) encoder pre-trained on ImageNet1k sourced from the timm [19] [...] We use 0.1 as the temperature for [...] Center Cropping, to avoid destroying too much information. [...] C (Small) model that is developed in [2] for character recognition. If multiple article bounding boxes satisfy these rules for a given headline, then we take the highest.
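The excerpt above names the "outside" formulation of the Supervised Contrastive loss. As a rough illustration only, and not the authors' implementation, the sketch below computes that loss in plain Python using the general definition: for each anchor, the log-probability of each positive is averaged outside the logarithm. The function names, the toy inputs, and the use of 0.1 as the loss temperature are assumptions; the excerpt is truncated exactly where it would say what the temperature applies to.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length (assumes the norm is not zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss_outside(embeddings, labels, temperature=0.1):
    """Mean "outside" SupCon loss over all anchors that have a positive.

    embeddings: list of equal-length float lists, one per sample.
    labels: list of class labels, one per sample.
    temperature: assumed here to be the 0.1 quoted in the excerpt.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples that share the anchor's label.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled cosine similarity of the anchor to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values())
        )
        # "Outside": average the log-probabilities over the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With this definition, a batch whose same-label embeddings point in similar directions yields a loss close to zero, while a batch whose labels cut across the clusters yields a much larger value.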

artificial intelligence, machine learning, natural language, (19 more...)

Industry:

Law (1.00)
Information Technology (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

ffeb860479ccae44d84c0de32acd693d-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-18-2026, 04:22:47 GMT

american story, chronicling america, dataset, (15 more...)

Country:

North America > Panama (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Industry:

Law (1.00)
Information Technology (1.00)
Government > Regional Government (0.46)
Media > News (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

ffeb860479ccae44d84c0de32acd693d-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Oct-10-2025, 23:56:01 GMT

data mining, machine learning, natural language, (21 more...)

Country:

North America > Panama (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(8 more...)

Industry:

Law (1.00)
Information Technology (1.00)
Government > Regional Government (0.46)
Media > News (0.32)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages

Piryani, Bhawna, Mozafari, Jamshid, Jatowt, Adam

arXiv.org Artificial Intelligence, May-10-2024

Question answering (QA) and Machine Reading Comprehension (MRC) tasks have significantly advanced in recent years due to the rapid development of deep learning techniques and, more recently, large language models. At the same time, many benchmark datasets have become available for QA and MRC tasks. However, most existing large-scale benchmark datasets have been created predominantly using synchronous document collections like Wikipedia or the Web. Archival document collections, such as historical newspapers, contain valuable information from the past that is still not widely used to train large language models. To further contribute to advancing QA and MRC tasks and to overcome the limitation of previous datasets, we introduce ChroniclingAmericaQA, a large-scale temporal QA dataset with 487K question-answer pairs created based on the historical newspaper collection Chronicling America. Our dataset is constructed from a subset of the Chronicling America newspaper collection spanning 120 years. One of the significant challenges for utilizing digitized historical newspaper collections is the low quality of OCR text. Therefore, to enable realistic testing of QA models, our dataset can be used in three different ways: answering questions from raw and noisy content, answering questions from a cleaner, corrected version of the content, as well as answering questions from scanned images of newspaper pages. This and the fact that ChroniclingAmericaQA spans the longest time period among available QA datasets make it quite a unique and useful resource.

chroniclingamericaqa, computational linguistic, dataset, (13 more...)

doi: 10.1145/3626772.3657891

arXiv: 2403.17859

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(61 more...)

Genre: Research Report (0.82)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

Dell, Melissa, Carlson, Jacob, Bryan, Tom, Silcock, Emily, Arora, Abhishek, Shen, Zejiang, D'Amico-Wong, Luca, Le, Quan, Querubin, Pablo, Heldring, Leander

arXiv.org Artificial Intelligence, Aug-23-2023

Existing full text datasets of U.S. public domain newspapers do not recognize the often complex layouts of newspaper scans, and as a result the digitized content scrambles texts from articles, headlines, captions, advertisements, and other layout regions. OCR quality can also be low. This study develops a novel, deep learning pipeline for extracting full article texts from newspaper images and applies it to the nearly 20 million scans in the Library of Congress's public domain Chronicling America collection. The pipeline includes layout detection, legibility classification, custom OCR, and association of article texts spanning multiple bounding boxes. To achieve high scalability, it is built with efficient architectures designed for mobile phones. The resulting American Stories dataset provides high quality data that could be used for pre-training a large language model to achieve better understanding of historical English and historical world knowledge. The dataset could also be added to the external database of a retrieval-augmented language model to make historical information - ranging from interpretations of political events to minutiae about the lives of people's ancestors - more widely accessible. Furthermore, structured article texts facilitate using transformer-based methods for popular social science applications like topic classification, detection of reproduced content, and news story clustering. Finally, American Stories provides a massive silver quality dataset for innovating multimodal layout analysis models and other multimodal applications.

large language model, machine learning, natural language, (20 more...)

arXiv: 2308.12477

Country:

North America > Panama (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(8 more...)

Genre: Research Report (0.64)

Industry:

Media > News (1.00)
Law (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

How To Search Historical Newspaper Images Using Artificial Intelligence

#artificialintelligence, Dec-16-2020, 19:34:20 GMT

Teachers and students (or anyone else in the public, for that matter) can now explore more than 1.5 million historical newspaper images online using artificial intelligence. The latest machine learning experience from LC Labs, Newspaper Navigator allows users to search visual content in American newspapers dating from 1789-1963. The user begins by entering a keyword that returns a selection of photos. Then the user can choose photos to search against, allowing the discovery of related images that were previously undetectable by search engines. For decades, partners across the United States have collaborated to digitize newspapers through the Library's Chronicling America website, a database of historical U.S. newspapers.

library, newspaper, newspaper navigator, (12 more...)

Country: North America > United States (0.56)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Applied AI (0.61)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Newspaper Navigator

University of Washington Computer Science, May-6-2020, 21:24:28 GMT

Welcome to the Newspaper Navigator dataset! This dataset consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project. The dataset also includes text corresponding to the visual content, identified by extracting the Optical Character Recognition, or OCR, within each predicted bounding box. For example, if the visual content recognition model predicted a bounding box around a headline, the corresponding textual content provides a machine-readable version of the headline; likewise, for a photograph, illustration, or map, this textual representation often contains the title and caption.
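The step of extracting the OCR text inside each predicted bounding box can be pictured with a small sketch. This is an illustration under stated assumptions, not the Newspaper Navigator code: the `Box` and `OcrWord` types, the function names, and the rule that a word belongs to a region when its center falls inside it are all hypothetical, and the OCR words are assumed to arrive already in reading order.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in page pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


class OcrWord(NamedTuple):
    """One OCR token with its location on the page (hypothetical type)."""
    text: str
    box: Box


def center_inside(inner: Box, outer: Box) -> bool:
    """True when the center point of `inner` lies within `outer`."""
    cx = (inner.x0 + inner.x1) / 2
    cy = (inner.y0 + inner.y1) / 2
    return outer.x0 <= cx <= outer.x1 and outer.y0 <= cy <= outer.y1


def text_in_region(words: List[OcrWord], region: Box) -> str:
    """Join, in input order, the OCR words whose centers fall in `region`."""
    return " ".join(w.text for w in words if center_inside(w.box, region))
```

Calling `text_in_region` with a page's OCR words and a box predicted around a headline would return the headline as a single string; for a photograph or map, the same call would tend to pick up any title or caption text that falls inside the box.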

artificial intelligence, dataset, optical character recognition, (14 more...)

Industry: Media > News (0.91)

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.55)
